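Java Container Source Code Analysis: ArrayList

Overview

ArrayList is one of the most commonly used collection classes. The JDK documentation describes it as a resizable-array implementation of the List interface. The class is declared as follows: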

public class ArrayList<E> extends AbstractList<E> implements List<E>, RandomAccess, Cloneable, java.io.Serializable

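ArrayList extends the abstract class AbstractList and implements the List, RandomAccess, Cloneable, and Serializable interfaces. Implementing RandomAccess signals that the list supports fast random access (it is backed by an array, after all). Like Cloneable and Serializable, RandomAccess is only a marker interface and declares no methods. ArrayList also permits null elements.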
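
As a quick check of the two points above, here is a small sketch that tests for the RandomAccess marker and stores a null element. The class name MarkerDemo and the helper supportsRandomAccess are illustrative names, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class MarkerDemo {
    // Returns true when the list carries the RandomAccess marker interface.
    static boolean supportsRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        List<String> arrayList = new ArrayList<String>();
        arrayList.add(null); // null is a legal element
        arrayList.add("a");

        System.out.println(supportsRandomAccess(arrayList));                // true
        System.out.println(supportsRandomAccess(new LinkedList<String>())); // false
        System.out.println(arrayList.indexOf(null));                        // 0
    }
}
```

Generic algorithms can use this kind of instanceof check to choose between an index-based loop and an iterator-based one.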

All of the analysis in this article is based on the JDK 8 source code.

Underlying Structure

As the documentation says, ArrayList is backed by an array. Let's start with its fields:

private static final long serialVersionUID = 8683452581122892189L;

private static final int DEFAULT_CAPACITY = 10;
private static final Object[] EMPTY_ELEMENTDATA = {};
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

//The maximum size of array to allocate.
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

transient Object[] elementData;
private int size;

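An Object array holds the data, and a counter records how many elements the list currently contains. Note that the array elementData is declared transient; the reason is explained later, in the discussion of serialization.

In addition, ArrayList inherits a field from its parent class AbstractList that deserves attention: modCount. It records the number of times the list has been structurally modified, which is what makes fail-fast iterators possible. ArrayList is not synchronized, so if another thread adds or removes elements while the list is being iterated (or if the iterating code itself modifies the list other than through the iterator), the iterator throws a ConcurrentModificationException.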

//The number of times this list has been structurally modified.
protected transient int modCount = 0;
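
The fail-fast check does not actually need a second thread to trigger it. The sketch below (FailFastDemo and modifyWhileIterating are illustrative names) adds an element inside a for-each loop on a single thread and reports whether the iterator noticed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
    // Returns true if adding to the list inside a for-each loop
    // trips the iterator's fail-fast check.
    static boolean modifyWhileIterating(List<Integer> list) {
        try {
            for (Integer value : list) {
                if (value == 1) {
                    list.add(99); // structural modification: modCount changes
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
        System.out.println(modifyWhileIterating(list)); // true
    }
}
```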

Initialization

/**
 * Constructs an empty list with the specified initial capacity.
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

/**
 * Constructs a list containing the elements of the specified
 * collection, in the order they are returned by the collection's
 * iterator.
 */
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        // c.toArray might (incorrectly) not return Object[]
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

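ArrayList provides the three constructors shown above. Besides creating an empty ArrayList, it can also be initialized with the elements of another collection. When an empty ArrayList is created without specifying a capacity, the default capacity is 10; in JDK 8, however, the array of length 10 is not allocated by the constructor but lazily, on the first insertion (see ensureCapacityInternal). If the specified capacity is 0, elementData refers to the shared static array EMPTY_ELEMENTDATA; if the no-argument constructor is used, it refers to the shared static array DEFAULTCAPACITY_EMPTY_ELEMENTDATA; otherwise an array of the specified capacity is allocated. Keeping the two empty arrays distinct lets the list know how far to grow when the first element is added.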
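
From the caller's side, the three constructors behave roughly as in the following sketch. ConstructorDemo and its two helpers are illustrative names; the internal array itself is not observable through the public API, so only the visible behavior is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class ConstructorDemo {
    // A negative initial capacity is rejected with IllegalArgumentException.
    static boolean rejectsNegativeCapacity() {
        try {
            new ArrayList<String>(-1);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // The collection constructor copies the elements into a new array,
    // so the new list is independent of the source collection.
    static List<String> copyOf(Collection<String> source) {
        return new ArrayList<String>(source);
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<String>();    // array allocated on first add
        List<String> sized = new ArrayList<String>(100); // array of length 100 up front
        List<String> copy = copyOf(Arrays.asList("a", "b"));
        copy.add("c");

        System.out.println(empty.size() + " " + sized.size() + " " + copy.size()); // 0 0 3
        System.out.println(rejectsNegativeCapacity());                            // true
    }
}
```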

Resizing

Since ArrayList is backed by a resizable array, it has to grow whenever the underlying array runs out of room. Growing is implemented by the following code:

/**
 * Increases the capacity of this <tt>ArrayList</tt> instance, if
 * necessary, to ensure that it can hold at least the number of elements
 * specified by the minimum capacity argument.
 */
public void ensureCapacity(int minCapacity) {
    int minExpand = (elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
        // any size if not default element table
        ? 0
        // larger than default for default empty table. It's already
        // supposed to be at default size.
        : DEFAULT_CAPACITY;

    if (minCapacity > minExpand) {
        ensureExplicitCapacity(minCapacity);
    }
}

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 */
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

public int size() {
    return size;
}

public boolean isEmpty() {
    return size == 0;
}

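The comments in this code are fairly clear, so here is just a brief summary. ensureCapacity is public and can be called by client code, whereas ensureCapacityInternal is for internal use only. Both make sure the list can hold the given number of elements, and both delegate to ensureExplicitCapacity. Every call to ensureExplicitCapacity increments modCount, marking a structural modification of the ArrayList, and then calls grow if the requested capacity exceeds the current array length. grow resizes the array: the new capacity is 1.5 times the current one (oldCapacity + (oldCapacity >> 1)); if that is still not enough, the required minimum capacity is used instead, and the existing elements are then copied into the new array. The largest capacity grow will ever request is Integer.MAX_VALUE; if the required capacity overflows int, hugeCapacity throws an OutOfMemoryError.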
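
To make the 1.5x rule concrete, the sketch below reproduces only the arithmetic of grow and hugeCapacity as a pure function and prints the capacities reached when elements are appended one at a time. GrowthDemo and newCapacity are illustrative names; no array is actually allocated.

```java
class GrowthDemo {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Mirrors the capacity arithmetic of grow(): 1.5 times the old capacity,
    // but never less than minCapacity, and capped for very large requests.
    static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        if (newCapacity - MAX_ARRAY_SIZE > 0) {
            if (minCapacity < 0) { // the requested capacity overflowed int
                throw new OutOfMemoryError();
            }
            newCapacity = (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
        }
        return newCapacity;
    }

    public static void main(String[] args) {
        int capacity = 10; // capacity after the first add on a default list
        StringBuilder sequence = new StringBuilder().append(capacity);
        for (int i = 0; i < 5; i++) {
            capacity = newCapacity(capacity, capacity + 1); // array is full, one more needed
            sequence.append(' ').append(capacity);
        }
        System.out.println(sequence); // 10 15 22 33 49 73
    }
}
```

Because the capacity grows geometrically, appending n elements triggers only O(log n) array copies, which is why add at the end is cheap on average.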

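One thing looks odd here: modCount is incremented in ensureExplicitCapacity. As mentioned earlier, modCount records the number of structural modifications, so one would expect it to change only when elements are added or removed. Why is it incremented here? As we will see below, ensureExplicitCapacity is called every time elements are added, so incrementing modCount in this method means that add and addAll do not have to do it themselves.

Adding Elements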

/**
 * Appends the specified element to the end of this list.
 */
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

/**
 * Inserts the specified element at the specified position in this
 * list. Shifts the element currently at that position (if any) and
 * any subsequent elements to the right (adds one to their indices).
 */
public void add(int index, E element) {
    rangeCheckForAdd(index);

    ensureCapacityInternal(size + 1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}

/**
 * Appends all of the elements in the specified collection to the end of
 * this list, in the order that they are returned by the
 * specified collection's Iterator.  The behavior of this operation is
 * undefined if the specified collection is modified while the operation
 * is in progress.  (This implies that the behavior of this call is
 * undefined if the specified collection is this list, and this
 * list is nonempty.)
 */
public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount
    System.arraycopy(a, 0, elementData, size, numNew);
    size += numNew;
    return numNew != 0;
}

/**
 * Inserts all of the elements in the specified collection into this
 * list, starting at the specified position.  Shifts the element
 * currently at that position (if any) and any subsequent elements to
 * the right (increases their indices).  The new elements will appear
 * in the list in the order that they are returned by the
 * specified collection's iterator.
 */
public boolean addAll(int index, Collection<? extends E> c) {
    rangeCheckForAdd(index);

    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount

    int numMoved = size - index;
    if (numMoved > 0)
        System.arraycopy(elementData, index, elementData, index + numNew,
                         numMoved);

    System.arraycopy(a, 0, elementData, index, numNew);
    size += numNew;
    return numNew != 0;
}

/**
 * Checks if the given index is in range.  If not, throws an appropriate
 * runtime exception.  This method does *not* check if the index is
 * negative: It is always used immediately prior to an array access,
 * which throws an ArrayIndexOutOfBoundsException if index is negative.
 */
private void rangeCheck(int index) {
    if (index >= size)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}

/**
 * A version of rangeCheck used by add and addAll.
 */
private void rangeCheckForAdd(int index) {
    if (index > size || index < 0)
        throw new IndexOutOfBoundsException(outOfBoundsMsg(index));
}

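Elements can be added one at a time or as a whole collection, and either appended to the end of the array (the default) or inserted at a given position. The list first checks whether the current capacity is sufficient and grows the array if it is not. Every addition changes modCount, as explained in detail above. When a position is specified, it is validated first, and an IndexOutOfBoundsException is thrown if it is out of range; otherwise the elements from that position onward are shifted to the right to make room, and the new elements are stored in the gap.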
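
The index rules enforced by rangeCheckForAdd can be observed through the public API. In this sketch (InsertDemo and tryInsert are illustrative names), an index equal to size is accepted and behaves like an append, while anything beyond size or below zero is rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InsertDemo {
    // Inserts value at index and reports whether the index was accepted.
    static boolean tryInsert(List<String> list, int index, String value) {
        try {
            list.add(index, value);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", "b", "c"));

        tryInsert(list, 1, "x");                        // "b" and "c" shift right
        System.out.println(list);                       // [a, x, b, c]

        tryInsert(list, list.size(), "y");              // index == size: append
        System.out.println(list);                       // [a, x, b, c, y]

        System.out.println(tryInsert(list, 6, "z"));    // false: index > size
    }
}
```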

Removing Elements

/**
 * Removes the element at the specified position in this list.
 * Shifts any subsequent elements to the left (subtracts one from their
 * indices).
 */
public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

/**
 * Removes the first occurrence of the specified element from this list,
 * if it is present.  If the list does not contain the element, it is
 * unchanged.  More formally, removes the element with the lowest index
 */
public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

/*
 * Private remove method that skips bounds checking and does not
 * return the value removed.
 */
private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work
}

/**
 * Removes all of the elements from this list.  The list will
 * be empty after this call returns.
 */
public void clear() {
    modCount++;

    // clear to let GC do its work
    for (int i = 0; i < size; i++)
        elementData[i] = null;

    size = 0;
}

/**
 * Removes from this list all of the elements whose index is between
 * {@code fromIndex}, inclusive, and {@code toIndex}, exclusive.
 * Shifts any succeeding elements to the left (reduces their index).
 * This call shortens the list by {@code (toIndex - fromIndex)} elements.
 * (If {@code toIndex==fromIndex}, this operation has no effect.)
 */
protected void removeRange(int fromIndex, int toIndex) {
    modCount++;
    int numMoved = size - toIndex;
    System.arraycopy(elementData, toIndex, elementData, fromIndex,
                     numMoved);

    // clear to let GC do its work
    int newSize = size - (toIndex-fromIndex);
    for (int i = newSize; i < size; i++) {
        elementData[i] = null;
    }
    size = newSize;
}

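Removing elements amounts to using System.arraycopy to move the remaining elements into their correct positions and then adjusting size. Note that once the elements have been moved, the array slots that are no longer in use must be explicitly set to null, so that the GC can reclaim the objects they referred to.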
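
The shift-and-clear step can be tried in isolation on a plain array. The following is a minimal sketch, assuming the caller tracks the logical size separately; RemoveDemo and removeAt are illustrative names, and bounds checking is omitted for brevity.

```java
import java.util.Arrays;

class RemoveDemo {
    // Removes the element at index from the first `size` slots of data
    // and returns the new logical size.
    static int removeAt(Object[] data, int size, int index) {
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // shift the tail one slot to the left, overwriting data[index]
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        data[--size] = null; // drop the stale reference so the GC can reclaim it
        return size;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", null};
        int size = removeAt(data, 3, 0);
        System.out.println(size + " " + Arrays.toString(data)); // 2 [b, c, null, null]
    }
}
```

Without the final assignment, the last slot would still point at "c" after the shift, and the array would keep that object reachable even once it has been removed from the list.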

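Next, let's see how ArrayList removes all elements contained in a given collection, and how it retains only the elements contained in a given collection.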

/**
 * Removes from this list all of its elements that are contained in the
 * specified collection.
 */
public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    return batchRemove(c, false);
}

/**
 * Retains only the elements in this list that are contained in the
 * specified collection.  In other words, removes from this list all
 * of its elements that are not contained in the specified collection.
 */
public boolean retainAll(Collection<?> c) {
    Objects.requireNonNull(c);
    return batchRemove(c, true);
}

private boolean batchRemove(Collection<?> c, boolean complement) {
    final Object[] elementData = this.elementData;
    int r = 0, w = 0;
    boolean modified = false;
    try {
        for (; r < size; r++)
            // 1) Removing the elements of c: complement == false,
            //    so elementData[r] is kept if it is not in c.
            // 2) Retaining the elements of c: complement == true,
            //    so elementData[r] is kept if it is in c.
            if (c.contains(elementData[r]) == complement)
                elementData[w++] = elementData[r];
    } finally {
        // Preserve behavioral compatibility with AbstractCollection,
        // even if c.contains() throws.
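        // 1) r == size: the loop completed normally.
        // 2) r != size: c.contains() threw an exception, perhaps because
        //    an element's type is incompatible with c or because c does
        //    not permit null elements; copy the elements that have not
        //    been examined yet forward.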
        if (r != size) {
            System.arraycopy(elementData, r,
                             elementData, w,
                             size - r);
            w += size - r;
        }
        if (w != size) {
            // clear to let GC do its work
            for (int i = w; i < size; i++)
                elementData[i] = null;
            modCount += size - w;
            size = w;
            modified = true;
        }
    }
    return modified;
}

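As we can see, the core method is batchRemove(Collection<?> c, boolean complement). Both removeAll(Collection<?> c), which removes the elements contained in the given collection, and retainAll(Collection<?> c), which keeps only those elements, are implemented on top of it. The boolean argument decides whether each element of the ArrayList is kept; see the annotations added to the code above for details.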
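
The read-index/write-index idea behind batchRemove can be sketched on a plain array as follows. This is a simplified version that leaves out the try/finally recovery path and the modCount bookkeeping; CompactDemo and compact are illustrative names.

```java
import java.util.Arrays;
import java.util.Collection;

class CompactDemo {
    // Keeps the elements whose membership in c equals `complement`,
    // compacting them to the front of data; returns the new logical size.
    static int compact(Object[] data, int size, Collection<?> c, boolean complement) {
        int w = 0; // next write position
        for (int r = 0; r < size; r++) { // r is the read position
            if (c.contains(data[r]) == complement) {
                data[w++] = data[r];
            }
        }
        for (int i = w; i < size; i++) {
            data[i] = null; // clear the leftover tail
        }
        return w;
    }

    public static void main(String[] args) {
        Object[] removeCase = {1, 2, 3, 4, 5};
        int n1 = compact(removeCase, 5, Arrays.asList(2, 4), false); // like removeAll
        System.out.println(n1 + " " + Arrays.toString(removeCase));  // 3 [1, 3, 5, null, null]

        Object[] retainCase = {1, 2, 3, 4, 5};
        int n2 = compact(retainCase, 5, Arrays.asList(2, 4), true);  // like retainAll
        System.out.println(n2 + " " + Arrays.toString(retainCase));  // 2 [2, 4, null, null, null]
    }
}
```

Since the write index never overtakes the read index, the compaction happens in place in a single pass, with no temporary array.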

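None of the removal methods above shrinks the backing array, so removing a large number of elements can leave a lot of wasted space. In that case, trimToSize can be called to shrink the array to the actual number of elements.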

/**
 * Trims the capacity of this ArrayList instance to be the
 * list's current size.  An application can use this operation to minimize
 * the storage of an ArrayList instance.
 */
public void trimToSize() {
    modCount++;
    if (size < elementData.length) {
        elementData = (size == 0)
          ? EMPTY_ELEMENTDATA
          : Arrays.copyOf(elementData, size);
    }
}

Updating and Searching


public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

/**
 * Returns the index of the first occurrence of the specified element
 * in this list, or -1 if this list does not contain the element.
 * More formally, returns the lowest index <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
 * or -1 if there is no such index.
 */
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

/**
 * Returns the index of the last occurrence of the specified element
 * in this list, or -1 if this list does not contain the element.
 * More formally, returns the highest index <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
 * or -1 if there is no such index.
 */
public int lastIndexOf(Object o) {
    if (o == null) {
        for (int i = size-1; i >= 0; i--)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = size-1; i >= 0; i--)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

/**
 * Returns the element at the specified position in this list.
 */
public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}

/**
 * Replaces the element at the specified position in this list with
 * the specified element.
 */
public E set(int index, E element) {
    rangeCheck(index);

    E oldValue = elementData(index);
    elementData[index] = element;
    return oldValue;
}

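Because the implementation is array-based, updating and looking up elements is straightforward: get and set are constant-time index operations, while contains, indexOf, and lastIndexOf scan the array linearly. Note that set does not modify modCount, because replacing an element is not a structural modification.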
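
One visible consequence of set leaving modCount alone is that elements can be replaced while the list is being iterated. A small sketch (SetDemo and doubleInPlace are illustrative names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SetDemo {
    // Doubles every element in place during a for-each loop. set() is not a
    // structural modification, so the iterator's fail-fast check stays quiet.
    static List<Integer> doubleInPlace(List<Integer> list) {
        int i = 0;
        for (Integer value : list) {
            list.set(i++, value * 2);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
        System.out.println(doubleInPlace(list)); // [2, 4, 6]
    }
}
```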

Iteration

AbstractList already provides an iterator implementation; ArrayList supplies an optimized version of its own.

/**
 * An optimized version of AbstractList.Itr
 */
private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    int expectedModCount = modCount;

    public boolean hasNext() {
        return cursor != size;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();

        try {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public void forEachRemaining(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        final int size = ArrayList.this.size;
        int i = cursor;
        if (i >= size) {
            return;
        }
        final Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length) {
            throw new ConcurrentModificationException();
        }
        while (i != size && modCount == expectedModCount) {
            consumer.accept((E) elementData[i++]);
        }
        // update once at end of iteration to reduce heap write traffic
        cursor = i;
        lastRet = i - 1;
        checkForComodification();
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

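The iterator walks through the elements with a cursor, and it also remembers the index of the last element returned so that remove can be implemented. As noted earlier, ArrayList is not thread-safe, and its fail-fast behavior is built on modCount. Here we can see clearly that methods such as next and remove call checkForComodification up front to check whether the list has been structurally modified since the iterator was created. The check compares expectedModCount (the value of modCount captured when the iterator was created, and refreshed after the iterator's own remove) with the current modCount. If they differ, the ArrayList was modified (elements were added or removed) by something other than this iterator, whether another thread or the iterating code itself, and a ConcurrentModificationException is thrown.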
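
Because Itr.remove resynchronizes expectedModCount with modCount, removing through the iterator is the safe way to delete elements during a traversal. A minimal sketch (IteratorRemoveDemo and removeEven are illustrative names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorRemoveDemo {
    // Removes all even numbers while iterating, using Iterator.remove()
    // so that the iterator's expectedModCount stays in sync with the list.
    static void removeEven(List<Integer> list) {
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4));
        removeEven(list);
        System.out.println(list); // [1, 3]
    }
}
```

Calling list.remove(...) directly inside the same loop would change modCount behind the iterator's back and could make the next call to next() throw.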

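The List interface also supports another kind of iterator, ListIterator<E>, which can not only move forward with next() but also move the cursor backward with previous(). ArrayList implements listIterator() and listIterator(int index) as well; they are fairly simple, so they are not covered in detail here.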
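
For completeness, here is a short usage sketch of backward traversal: listIterator(int index) places the cursor at the end of the list, and previous() walks back toward the front. ListIteratorDemo and reversedCopy are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Builds a reversed copy by starting the cursor at the end of the list
    // and stepping backward with previous().
    static <T> List<T> reversedCopy(List<T> list) {
        List<T> out = new ArrayList<T>(list.size());
        ListIterator<T> it = list.listIterator(list.size());
        while (it.hasPrevious()) {
            out.add(it.previous());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reversedCopy(Arrays.asList(1, 2, 3))); // [3, 2, 1]
    }
}
```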

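Sublists

A sublist is a view of the elements within a given range of a list, obtained by calling subList(int fromIndex, int toIndex). Operations on a sublist affect the parent list. A sublist therefore makes it possible to work on just part of the parent list, for example to iterate over only a range of elements, or to sort only that range.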

private class SubList extends AbstractList<E> implements RandomAccess {
    private final AbstractList<E> parent;
    private final int parentOffset;
    private final int offset;
    int size;

    SubList(AbstractList<E> parent,
            int offset, int fromIndex, int toIndex) {
        this.parent = parent;
        this.parentOffset = fromIndex;
        this.offset = offset + fromIndex;
        this.size = toIndex - fromIndex;
        this.modCount = ArrayList.this.modCount;
    }

    public boolean addAll(int index, Collection<? extends E> c) {
        rangeCheckForAdd(index);
        int cSize = c.size();
        if (cSize==0)
            return false;

        checkForComodification();
        parent.addAll(parentOffset + index, c);
        this.modCount = parent.modCount;
        this.size += cSize;
        return true;
    }

    private void checkForComodification() {
        if (ArrayList.this.modCount != this.modCount)
            throw new ConcurrentModificationException();
    }
}

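The code above is an excerpt of the sublist class used by ArrayList. SubList extends AbstractList and implements RandomAccess. Unlike ArrayList, SubList has no array of its own for storing elements; instead it holds a reference to the parent list, along with its offset and size within the parent. All operations on the sublist are carried out through the parent list. It is worth noting that, because SubList is also a subclass of AbstractList, it has its own modCount field. When the sublist is created, its modCount equals the parent's, and it is kept in sync whenever the parent is modified through the sublist. As with iterators, sublist methods call checkForComodification before touching the parent, to make sure the parent list has not been structurally changed behind the sublist's back; otherwise a ConcurrentModificationException is thrown.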
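
Both behaviors described above, writing through to the parent and failing once the parent changes on its own, can be seen in a few lines. SubListDemo and its two helpers are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class SubListDemo {
    // Clearing the view removes the corresponding range from the parent.
    static List<Integer> clearRange(List<Integer> parent, int from, int to) {
        parent.subList(from, to).clear();
        return parent;
    }

    // Returns true if a view becomes unusable after the parent is
    // structurally modified directly rather than through the view.
    static boolean staleViewThrows() {
        List<Integer> parent = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
        List<Integer> view = parent.subList(1, 3);
        parent.add(6); // bypasses the view, so the two modCounts diverge
        try {
            view.get(0);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> parent = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(clearRange(parent, 1, 3)); // [1, 4, 5]
        System.out.println(staleViewThrows());        // true
    }
}
```

In practice this means a sublist is best treated as a short-lived view: obtain it, use it, and discard it before modifying the parent again.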

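Serialization

As mentioned earlier, the array elementData is declared transient, which has to do with serialization and deserialization. transient is a keyword: a field marked transient is not part of the object's persistent state, that is, the default serialization mechanism skips it.

This may seem puzzling: if the array is not serialized, won't the stored data be lost on deserialization? In fact, ArrayList takes finer-grained control of serialization and deserialization through its writeObject() and readObject() methods.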

/**
 * Save the state of the <tt>ArrayList</tt> instance to a stream (that
 * is, serialize it).
 *
 * @serialData The length of the array backing the <tt>ArrayList</tt>
 *             instance is emitted (int), followed by all of its elements
 *             (each an <tt>Object</tt>) in the proper order.
 */
private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException{
    // Write out element count, and any hidden stuff
    int expectedModCount = modCount;
    s.defaultWriteObject();

    // Write out size as capacity for behavioural compatibility with clone()
    s.writeInt(size);

    // Write out all elements in the proper order.
    for (int i=0; i<size; i++) {
        s.writeObject(elementData[i]);
    }

    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

/**
 * Reconstitute the <tt>ArrayList</tt> instance from a stream (that is,
 * deserialize it).
 */
private void readObject(java.io.ObjectInputStream s)
    throws java.io.IOException, ClassNotFoundException {
    elementData = EMPTY_ELEMENTDATA;

    // Read in size, and any hidden stuff
    s.defaultReadObject();

    // Read in capacity
    s.readInt(); // ignored

    if (size > 0) {
        // be like clone(), allocate array based upon size not capacity
        ensureCapacityInternal(size);

        Object[] a = elementData;
        // Read in all elements in the proper order.
        for (int i=0; i<size; i++) {
            a[i] = s.readObject();
        }
    }
}

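As we can see, serialization does not write the whole backing array to the output stream. The array is usually not completely full, so only the first size elements are written; the unused slots beyond size (which are always null) are skipped, which saves space. Null elements stored within the list itself are still written. We will see later that many other collection classes serialize and deserialize themselves in the same way.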
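
A round trip through an in-memory stream shows that the contents survive even though elementData is transient. This is a minimal sketch; SerializationDemo and roundTrip are illustrative names, and checked exceptions are wrapped in IllegalStateException purely to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

class SerializationDemo {
    // Serializes the list to a byte array and reads it back as a new list.
    @SuppressWarnings("unchecked")
    static <T> ArrayList<T> roundTrip(ArrayList<T> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list); // ends up in ArrayList.writeObject
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ArrayList<T>) in.readObject(); // ends up in ArrayList.readObject
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> source = new ArrayList<String>(100); // 100 slots, 3 in use
        source.add("a");
        source.add(null); // a null inside the list is part of its contents
        source.add("b");

        ArrayList<String> copy = roundTrip(source);
        System.out.println(copy);                // [a, null, b]
        System.out.println(copy.equals(source)); // true
    }
}
```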

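Summary

This article analyzed the implementation of ArrayList in the Java 8 collections framework by reading its source code. ArrayList is backed by an array and therefore offers efficient random access. Inserting or removing elements anywhere but at the end, however, requires shifting part of the array, and growing requires copying all of it, so these operations are relatively expensive. ArrayList is a good fit when the container is accessed heavily after it has been built and insertions and removals are comparatively rare.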

posted @ 2017-08-15 17:27 _1900
